Skip to content
This repository was archived by the owner on Oct 12, 2022. It is now read-only.
/ druntime Public archive

core.experimental.memory module with D replacement for memcpy added#2687

Closed
baziotis wants to merge 13 commits intodlang:masterfrom
baziotis:Dmemcpy
Closed

core.experimental.memory module with D replacement for memcpy added#2687
baziotis wants to merge 13 commits intodlang:masterfrom
baziotis:Dmemcpy

Conversation

@baziotis
Copy link
Contributor

This is a D replacement for the C Standard Library memcpy. It provides 2 interfaces:

  • The "D" one, which takes, one of ref, dynamic arrays or static arrays.
  • The "C" one, via the name Dmemcpy (for disambiguation with the libc), which takes the classic
    (void*, void*, size_t).

Since this function uses SIMD, I found convenient to PR a simple SIMD library, which I hope will be extended by the community. Its main purpose is to provide cross-compilation intrinsics for D.

Ideally, the core.experimental.memory.simd would be a different PR, but without it basically a PR for memcpy can't happen.

The change in core.internal.traits is the same done here: #2662
I have copied everything necessary from std.traits to have isArray available.

There are a couple things I'm not sure about but are minor, so better wait for feedback.

@@ -0,0 +1,170 @@
/**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be in core.experimental.memory? Wouldn't core.experimental make more sense?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends on how you look at it. It was proposed here: #2671 (comment)
Which makes sense if the SIMD operations are only memory operations (which they are currently). Eventually, I hope that this will become a bigger SIMD library and be moved out to experimental (or wherever) because D needs one that is able to cross-compile. At least I don't know of one and so I made this.

@lesderid
Copy link
Contributor

Ideally, the core.experimental.memory.simd would be a different PR, but without it basically a PR for memcpy can't happen.

Perhaps make two PRs but include the SIMD commits in the memcpy one too?

@baziotis
Copy link
Contributor Author

Perhaps make two PRs but include the SIMD commits in the memcpy one too?

Yes, that's an alternative. But then if changes happen in this or the other PR, then the changes have to be made on both PRs. Ideally, first the PR on SIMD would be made, then this, but it was done to save time. It makes some sense to be here, as the internal.traits, since besides the code, we see a use case. I'll leave it an open discussion for a little time, thanks for the feedback. :)

Copy link
Contributor

@thewilsonator thewilsonator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please clean up your notes and TODOs


/* Handle dynamic sizes. `d` and `s` must not overlap.
*/
void Dmemcpy(void* d, const(void)* s, size_t n)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

private

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is public because of the description of this PR:

This is a D replacement for the C Standard Library memcpy. It provides 2 interfaces:

The "D" one, which takes, one of ref, dynamic arrays or static arrays.
The "C" one, via the name Dmemcpy (for disambiguation with the libc), which takes the classic: (void*, void*, size_t).

So, because the idea was to provide the C interface as well. Is this desired?
Otherwise, yes it should be private.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't that the C version then? I think all bases should be covered with static arrays and dynamic arrays, you can always slice a pointer to create a dynamic array from it and a length..

Copy link
Contributor Author

@baziotis baziotis Jul 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is the C version (meaning, with the C interface). It is named Dmemcpy for disambiguation with the libc memcpy.
We changed the Dmemcpy name to memcpy for the D interface, which provides 3 overloads, the ones you can see on top of the file here, named memcpy. I guess this was done with the purpose that they are overloads for the libc memcpy. They provide handling for refs, static arrays and dynamic arrays.
But I thought it would be good to also provide a C interface but for the D version (aka the druntime version) named Dmemcpy, that is different from the libc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not supported on all platforms.

Where is it not supported?

D just doesn't have a standard for it... but that's very possible to fix.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is it not supported?

AFAIK MSVC (or whatever object format that Windows uses, COFF/PE?) doesn't support weak linkage, but I might be wrong.

D just doesn't have a standard for it... but that's very possible to fix.

I agree, would be great to have this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but I might be wrong.

Yeah. I'm pretty sure weak linkage is available everywhere, and it might be particularly useful in this project.

I wonder if there's any reason that we don't have a weak attribute already? I think I've asked for it in the past and it was denied because reasons... I wonder what they were :/

Copy link
Contributor

@wilzbach wilzbach Aug 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function should still be private or have a clear confusion free name.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I'm pretty sure weak linkage is available everywhere

The accepted answer and the most up-voted answer on StackOverflow[1] suggest that at least the semantics are different with MSVC. Other compilers/linkers might implement workarounds though, I don't have access to a Windows machine to test on.

[1] https://stackoverflow.com/questions/2290587/gcc-style-weak-linking-in-visual-studio?answertab=votes#tab-top

@thewilsonator
Copy link
Contributor

make[1]: *** No rule to make target '/dev/shm/dtest/work/repo/dlang.org/web/phobos-prerelease/core_experimental_memory_memcpy.html', needed by 'doc'. Stop.

I think you need to add this to the doc makefile.

@baziotis
Copy link
Contributor Author

Isn't the doc makefile the mak/DOCS?

@lesderid
Copy link
Contributor

Isn't the doc makefile the mak/DOCS?

Yes, but new (sub)directories have to be added in posix.mak too: https://github.com/dlang/druntime/blob/master/posix.mak#L157.

@baziotis
Copy link
Contributor Author

Thanks Les!

const(void)* s = cast(const(void)*) src.ptr;
size_t n = dst.length * typeof(dst[0]).sizeof;
// Assume that there is no overlap.
pragma(inline, true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this mean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What? It shows me 4 lines. Please point me to the line.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The one I tagged... pragma(inline, true);
That's a weird statement, what does it mean? What does it apply to?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I know, pragma(inline, true) instructs the compiler to inline the function call that follows.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as a stand-alone statement on its own hanging there? What does it apply to?

Can you show the doco where you read that? If that exists I'll be very surprised (I've been asking for it for many years).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has effect, but not the effect I said. My mistake: https://dlang.org/spec/pragma.html#inline
It inlines the function it is enclosed by.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wow. I had no idea you could do that. That's very weird.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems weird to me as well..

void memcpy(T, size_t len)(ref T[len] dst, ref const T[len] src)
{
T[] d = dst[0 .. $];
const T[] s = src[0 .. $];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why write these on separate lines?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

d and s? How else could they be?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

memcpy(dst[0 .. $], src[0 .. $]);

I only say because I stopped and re-read these lines carefully looking for the gotcha I assumed must exist to justify this long-written version. To me, writing it out this way makes it look like there's unexpected detail that should be distinctly stated... maybe that's just me though. I was looking for the substance in the noise, and there wasn't any.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. And yes, there's no gotcha.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that writing it in one line expresses the intent clearer.

{
ubyte* dst = cast(ubyte*) d;
const(ubyte)* src = cast(const(ubyte)*) s;
for (size_t i = 0; i != n; ++i)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why isn't this at least moving ulong's?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your comment made me think, thanks.

I could use the modified GNU algorithm: https://www.embedded.com/print/4024961
but to me this only complicates the code without any real gain. The reason is that it only tests if they're aligned and doesn't reach alignment. To do a more serious 4-byte or 8-byte move, one has reach alignment etc. like this: https://github.com/dlang/druntime/pull/2662/files#diff-b2f13b9afdcf6d39739b2bf55035f8e2R166

The reason I did not do either of those is that the compiler can do this and other things itself. It should be able to understand that moving either 4 or 8 bytes at a time is faster, reach alignment, even add SIMD. For example
LDC: https://godbolt.org/z/2V1Wl8
GDC: https://godbolt.org/z/XXzLUQ (GDC needs -O3 to do the crazy stuff).

Here, the compiler does all of the above. It even added something like a switch.

But, here's the part I hadn't tested. If on LDC we pass the -disable-loop-vectorization
LDC: https://godbolt.org/z/WVVJlz
or we specify an architecture that does not support SIMD, like let's say pentium2.
LDC: https://godbolt.org/z/Q1vqOS
then it doesn't do 4-byte moves (It should be 4 bytes there since it is 32-bit compilation).

The same is true for GDC: https://godbolt.org/z/PK8YDQ

So, I don't know if it pays to reach alignment but definetely I will use the GNU algorithm. It adds value without complicating the code much.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You shouldn't be depending on optimisations that DMD can't do.
And LDC/GDC should be using intrinsics exclusively, definitely not calling any of these functions. Taking that knowledge away from the optimiser may create some very lame copies in optimised code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DMD was the initial focus. Then, it changed LDC and GDC so pretty much, what DMD does became irrelevant. Otherwise, the code would be way different. This is not a hypothesis. I spent one month writing DMD versions of these functions that had to reach the libc ones. They were completely different from the current ones due to the optimizations that DMD could not do.

Meaning, if LDC and GDC use their intrinsics, this project becomes irrelevant. But to use their intrinsics, the above question should be answered, which is what LDC and GDC do when libc is not available.

For me, the logical move is to inform the optimizer that e.g. our memset is supposed to do what memset does. Which enables us to write our own versions of these functions (aka libc independency) and not lose the optimization info.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which is what LDC and GDC do when libc is not available.

There's a difference between a call to a function called mempy (which is hooked by an intrinsic), and a symbol called memcpy in a lib somewhere. You're writing the latter, but the former is what appears in the code. LLVM/GCC intrinsics don't always emit a call to a memcpy function, they can do anything they like. They might emit a call to a memcpy function, and if they do, it would be an advantage to understand in what context that can occur.

I don't know the answer here; I haven't looked into this, it might be that they just call the libc memcpy when they decide not to do anything special; or they might call into their own small runtime libraries? Perhaps their runtime libraries have a small suite of such functions optimised for different cases? I don't know, but anything's possible here.
All the compilers have a runtime library which is a lower-level library than libc. I'm not sure if those libraries contain memcpy/memset's for when the intrinsics bail out... or if they all just leave it to the libc implementation? If the intrinsic's fall-back functions are in the runtime library and not taken from libc, then we already have libc independence in those cases.

Anyway, I can see a world where it may be useful to have these functions for when the intrinsic does bail-out, but I think it's important to understand the implementation details of each compilers intrinsics to understand what they expect from these functions and the conditions when they're called.

You'll never be able to write optimised implementations for all platforms (or even x86, because it's a moving target); so I feel like the most interesting victory here is for DMD.

I'm sorry, I don't really know anything about this project you're working on, or the parameters for completion... but I think tackling memcpy/memset are probably lower priority when considering libc independence, and possibly the 2 most highly contentious items on the list.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meaning, if LDC and GDC use their intrinsics, this project becomes irrelevant.

Just to be clear; we should assume they are, and do absolutely everything we must to make sure we don't interfere with that. The case for us to consider here is when the intrinsics decide not to do anything special and fall-back to a lib-call... what functions do they call when they bail out? Is it libc memcpy/set, or something else customised for their runtimes?

Copy link
Member

@ibuclaw ibuclaw Aug 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Musl may be a good reference too, they have optimized C versions, that simply translate in D (and gdc/ldc would generate equivalent code to C).

https://github.com/bminor/musl/blob/master/src/string/memcpy.c
https://github.com/bminor/musl/blob/master/src/string/memset.c

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TurkeyMan I agree with everything you wrote. You see, when this project started (or when it moved to LDC / GDC), I didn't know any of these proeblems. It was Johan's comment (about a week ago) that made me realize "Oh, not using the intrinsics is like a really bad thing". I'm not sure how to move forward..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ibuclaw Yes, they're good reference, thanks. It was one of the early codes I was referred to. Generally though, Agner's implementations are supposed to be the fastest around (although that's not exactly true compared to the libc. This could be a very long discussion). He's considered a master in this field.

Other than that, unfortunately, I don't think that the main problems are with the code but with the goals.

Copy link
Contributor

@wilzbach wilzbach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • move traits in a separate PR into druntime, but with all tests. Then you can replace their Phobos implementation with a reference (we have switched to this paradigm a while ago. Applies to the old traits as well, but no one got around doing that, but at least no stuff shouldn't be duplicated)
  • lots of easy style improvements (foreach, one empty line between function)
  • attributes for unittests (especially the new @betterC)
  • docs seem to be a bit outdated and comments in general need polishing. All Stefanos notes should ideally be resolved or converted in a neutral comment

testStaticType!(S!32768);
testStaticType!(S!65536);
}
pragma(inline, false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove. It doesn't do anything.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

testStaticType!(ulong);
testStaticType!(float);
testStaticType!(double);
testStaticType!(real);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use static foreach for types too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean with AliasSeq probably. The static foreach is a problem for GDC though. Do you want 2 versions?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For testing, that makes sense. However, this module won't be backported to dmd-cxx, so you don't need to worry about backwards compatibility for older versions of D.

testStaticType!(S!8192);
testStaticType!(S!16384);
testStaticType!(S!32768);
testStaticType!(S!65536);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Foreach!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, the GDC thing.


/* Handle dynamic sizes. `d` and `s` must not overlap.
*/
void Dmemcpy(void* d, const(void)* s, size_t n)
Copy link
Contributor

@wilzbach wilzbach Aug 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function should still be private or have a clear confusion free name.

static string loop(string prefetchChoice)()
{
return
"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use token strings

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't find an easy way to interpolate the prefetchChoice with token strings.

Copy link
Contributor Author

@baziotis baziotis Aug 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't somehow answer on the comment above, so:

Function should still be private or have a clear confusion free name

If it is private, then the whole discussion about providing a C-like has no meaning.
So, if you don't want a C-like interface then I make it private otherwise, what would be a confusion-free name?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't find an easy way to interpolate the prefetchChoice with token strings.

q{ ...} ~ ...

* N.B.: Both the memcpy() here and Dmemcpy() return nothing, contrary to the C Standard
* Library version.
* Source: $(DRUNTIMESRC core/experimental/memory/memcpy.d)
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DStyle is without leading stars.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really get where to use the one and where the other.
For example, with leading stars: https://github.com/dlang/phobos/blob/master/std/algorithm/searching.d#L277
Without leading stars:
https://github.com/dlang/phobos/blob/master/std/algorithm/searching.d#L213

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're not sure, you can check the D Style document[1], which should always be followed for new code: "Documentation comments should not have leading stars on each line."

A lot of the code in druntime (and sadly also some in Phobos) is old and doesn't follow the style very closely or at all.

[1] https://dlang.org/dstyle.html

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I'm trying to follow but things escape here and there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes there's lots of old code here and in Phobos that was never changed, but at least new code should follow the guidelines.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes ok


/**
* Handle Static Types
* N.B.: No need for more sophisticated code for static types. The compiler
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use full sentences and no abbreviation. Have a look at std.algorithm for good examples on how to write good doc headers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it was a note and I was referred to a N.B. So, Note?

import core.internal.traits : isArray;

/**
* Handle Static Types
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This distinction isn't important to the user.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean it should be just a // that is not part of the doc?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No I meant that as you have both overloads defined, this is an implementation detail. Though as said on another point there's a case to simplify the interface even further, but it definitely shouldn't expose such details to the user.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is an implementation detail
Yes, so the // wouldn't be the right choice i.e. not be part of the doc?
On the merging of the interfaces there were problems that I could not solve: #2687 (comment)

*/
pragma(inline, true)
void memcpy(T)(ref T dst, ref const T src)
if (!isArray!T)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use this as the user facing API and delegate with static if? Reduces the exposed interfaces -> less confusion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because for me it's more confusing with one interface. I'll change it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a problem that I think I had when I had tried to join in one interface all of them. Basically, somehow the arrays get confused. e.g. if I have this code:

void memcpy(T)(ref T dst, ref const T src) nothrow @nogc
{
    static if (!isArray!T)
    {
        dst = src;
    }
    else
    {
        void memcpyArray(T)(ref T[] dst, ref const T[] src) nothrow @nogc
        {
            assert(dst.length == src.length);
            void* d = cast(void*) dst.ptr;
            const(void)* s = cast(const(void)*) src.ptr;
            size_t n = dst.length * typeof(dst[0]).sizeof;
            // Assume that there is no overlap.
            Dmemcpy(d, s, n);
        }

        static if (isDynamicArray!T)
        {
            memcpyArray(dst, src);
        }
        else
        {
            memcpyArray(dst[0 .. $], src[0 .. $]);
        }
    }
}

void memcpy(T, size_t len)(ref T[len] dst, ref const T[len] src)
{
T[] d = dst[0 .. $];
const T[] s = src[0 .. $];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that writing it in one line expresses the intent clearer.

@baziotis
Copy link
Contributor Author

baziotis commented Aug 7, 2019

I have the one problem I mentioned in the comment about trying to merge all of the memcpy in one interface.

Also, this:

    memcpy(dst[0 .. $], src[0 .. $]);

produces a segfault.

Other than those:

move traits in a separate PR into druntime, but with all tests.

If I move them from here, then this won't work. Unless you mean keep them here but also make a different PR (and then a different PR to remove them from Phobos).

Also, I can't seem to make @betterC work. I think I rebased successfully but still adding @betterC to a unittest yields undefined identifier.

Edit: The closure was by accident.

@baziotis baziotis closed this Aug 7, 2019
@baziotis baziotis reopened this Aug 7, 2019
@JohanEngelen
Copy link
Contributor

A note about compiler intrinsics: LLVM has intrinsics for memset and memcpy, but one doesn't need to use them to get optimizations going. E.g. in code like this, no intrinsics are used:

extern(C) void *memset(void *ptr, int x, ulong n);
void foo(int* ptr) {
    memset(ptr, 0, 2);
}

yet the optimizer recognizes memset (by C name and signature) and is assuming it correctly knows all about its semantics. This is an aggressive optimization (recognizing by name) that some people may frown upon. See: https://d.godbolt.org/z/H-yeMW . It is similar to the optimizer recognizing a code pattern and turning that into a memset call, again assuming it correctly knows the semantics of the memset provided at link time. As far as I know, there is currently no way to tell the optimizer that a function has the same semantics as memset, other than naming that function exactly like that.
There are a host of functions that LLVM recognizes, with elaborate logic related to it, see e.g. https://github.com/llvm/llvm-project/blob/7f7ef0208b57a4d253cd8b07053460f40ad7cbc8/llvm/lib/Analysis/TargetLibraryInfo.cpp#L64-L67

(For the interested, the same works for allocation functions: see here how Rust adds to its LLVM fork the names of its own stdlib functions to the list of recognized memory functions: rust-lang/llvm-project@1f4c17d . Although for that, we can use the allocSize attribute https://github.com/ldc-developers/druntime/blob/ldc/src/ldc/attributes.d#L16-L56)

@baziotis
Copy link
Contributor Author

baziotis commented Aug 7, 2019

Thanks for the info Johan!

As I said in a previous comment, when I started the implementation for LDC, GDC, I didn't know that:
a) What memset, memcpy etc. do is info known to the compiler that is used during optimization.
b) That info is very important to have.

As I said in the comments above, sadly (very sadly, I have written this code one for DMD and one for LDC, GDC) that info pretty much makes this project irrelevant.
To be more specific:

  • Not using the intrinsics is quite bad. And since, as you said, there's no way to specifically inform the compiler that some code has the same semantics as e.g. memset, it's not worth it. And writing code on the assumption that the compiler will assume the same semantics on either the code or naming, seems bad to me.
  • The one question that I had was what these intrinsics forward to when there is no libc. But the thing is, this code is x86_64 specific. And probably no-one who cares about performance won't have the libc (AFAIK).

So, this project is worth it if:

  • We're targetting DMD. Although I don't think that the initial goal of reaching libc (and so, my then approach) was good because a) it requires inline ASM b) keeping up to date with x86_64 performance is difficult. But DMD could
    benefit from performant, but generic D implementations.
  • Even if we're not targetting DMD, a generic implementation would be candidate. An implementation that doesn't try to optimize for x86_64 because it probably won't ever be used there (as the libc will probably be available). That could be something like the musl implementation which is generic C.

So, well I don't know.. I'm open to your thoughts.

Btw

A note about compiler intrinsics: LLVM has intrinsics for memset and memcpy, but one doesn't need to use them to get optimizations going. E.g. in code like this, no intrinsics are used:

yet the optimizer recognizes memset (by C name and signature) and is assuming it correctly knows all about its semantics. This is an aggressive optimization (recognizing by name) that some people may frown upon. See: https://d.godbolt.org/z/H-yeMW . It is similar to the optimizer recognizing a code pattern and turning that into a memset call, again assuming it correctly knows the semantics of the memset provided at link time.
There are a host of functions that LLVM recognizes, with elaborate logic related to it, see e.g. https://github.com/llvm/llvm-project/blob/7f7ef0208b57a4d253cd8b07053460f40ad7cbc8/llvm/lib/Analysis/TargetLibraryInfo.cpp#L64-L67

Wow!

(For the interested, the same works for allocation functions: see here how Rust adds to its LLVM fork the names of its own stdlib functions to the list of recognized memory functions: rust-lang/llvm-project@1f4c17d . Although for that, we can use the allocSize attribute https://github.com/ldc-developers/druntime/blob/ldc/src/ldc/attributes.d#L16-L56)

Interersting

@baziotis
Copy link
Contributor Author

Closed due to not being useful - more info here.

@baziotis baziotis closed this Sep 10, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants